AITopics | online sgd

From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD

Neural Information Processing Systems, Jun-18-2026, 16:22:24 GMT

To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recent works improved this by performing multiple gradient steps on the same sample with different learning rates -- yielding a non-correlational update rule -- and instead are limited by the (potentially much smaller) generative exponent. However, this picture is only valid when these learning rates are sufficiently large. In this paper, we characterize the relationship between learning rate(s) and sample complexity for a broad class of gradient-based algorithms that encapsulates both correlational and non-correlational updates. We demonstrate that, in certain cases, there is a phase transition from an "information exponent regime" with small learning rate to a "generative exponent regime" with large learning rate. Our framework covers prior analyses of one-pass SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm that leverages a two-timescales approach (via different learning rates for each layer) to go beyond correlational queries without reusing samples or modifying the loss from squared error. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.
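As a rough illustration of the setting described in this abstract, the sketch below runs one-pass (online) spherical SGD on a Gaussian single-index model with a correlational loss. It is not the algorithm analysed in the paper: the link function `tanh` (information exponent 1), the dimension, the learning rate, the step count, and the function name `online_sgd_single_index` are all assumptions chosen for a small, quick demonstration.

```python
import numpy as np

def online_sgd_single_index(d=50, steps=5000, lr=0.01, seed=0):
    """One-pass spherical SGD on y = tanh(w_star . x) with Gaussian inputs.

    Hypothetical helper for illustration; every sample is used exactly once.
    Returns the overlap <w, w_star> after each step.
    """
    rng = np.random.default_rng(seed)

    # Hidden direction and random initialisation, both on the unit sphere.
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    overlaps = [float(w @ w_star)]
    for _ in range(steps):
        x = rng.standard_normal(d)      # fresh isotropic Gaussian sample
        y = np.tanh(w_star @ x)         # label from the single-index model
        pre = w @ x
        # Gradient of the correlational loss -y * tanh(w . x) with respect to w.
        grad = -y * (1.0 - np.tanh(pre) ** 2) * x
        # Project onto the tangent space of the sphere, step, then renormalise.
        grad -= (grad @ w) * w
        w = w - lr * grad
        w /= np.linalg.norm(w)
        overlaps.append(float(w @ w_star))
    return np.array(overlaps)

if __name__ == "__main__":
    overlaps = online_sgd_single_index()
    print("initial overlap:", overlaps[0])
    print("final overlap:", overlaps[-1])
```

Because the update only uses the product of the label with a function of the input, this is a correlational update in the sense of the abstract. Varying `lr` in this toy gives a feel for how the step size trades off drift toward `w_star` against noise, but it does not reproduce the phase transition the paper studies.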

artificial intelligence, machine learning, sample complexity, (14 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

0525a72df7fb2cd943c780d059b94774-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 07:35:55 GMT

artificial intelligence, machine learning, offline sgd, (19 more...)

Neural Information Processing Systems

Country: Europe > France (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Neural Information Processing Systems, Apr-24-2026, 05:31:37 GMT

We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$.

artificial intelligence, gradient descent, machine learning, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

0525a72df7fb2cd943c780d059b94774-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 09:06:12 GMT

offline sgd, sgd, tail index, (17 more...)

Neural Information Processing Systems

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Neural Information Processing Systems, Dec-27-2025, 16:54:36 GMT

The outline of our paper is as follows.

algorithm, gradient descent, sample complexity, (11 more...)

Neural Information Processing Systems

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.05)
North America > United States (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Neural Information Processing Systems, Dec-23-2025, 17:01:17 GMT

We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples suffice for learning $w^\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient based methods only shows that $n \gtrsim d^{k^\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\star$ with $n \gtrsim d^{k^\star/2}$ samples. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.
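The definition of the information exponent in this abstract can be checked numerically for a given link function. The sketch below is one possible way to do so, using Gauss-Hermite quadrature with the probabilists' Hermite polynomials from NumPy. The function names, the normalisation by $\sqrt{k!}$, the convention of starting the search at degree 1 (ignoring the constant term), and the tolerance used to decide "nonzero" are assumptions of this illustration, not taken from the paper.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coefficients(sigma, max_degree=8, quad_points=64):
    """Approximate c_k = E[sigma(z) He_k(z)] / sqrt(k!) for z ~ N(0, 1)."""
    nodes, weights = hermegauss(quad_points)
    # hermegauss integrates against exp(-z^2 / 2); rescale to a probability measure.
    weights = weights / math.sqrt(2.0 * math.pi)
    values = sigma(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                      # selects He_k in the Hermite series
        he_k = hermeval(nodes, basis)
        c_k = float(np.sum(weights * values * he_k)) / math.sqrt(math.factorial(k))
        coeffs.append(c_k)
    return np.array(coeffs)

def information_exponent(sigma, max_degree=8, tol=1e-8):
    """Smallest k >= 1 with |c_k| > tol, or None if none is found up to max_degree."""
    coeffs = hermite_coefficients(sigma, max_degree)
    for k in range(1, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None

if __name__ == "__main__":
    print(information_exponent(np.tanh))                 # odd link with a linear component
    print(information_exponent(lambda z: z**2 - 1))      # He_2
    print(information_exponent(lambda z: z**3 - 3 * z))  # He_3
```

For the third Hermite polynomial $z^3 - 3z$ the exponent is $k^\star = 3$, so the two rates quoted in the abstract read $d^{k^\star-1} = d^2$ for online SGD and $d^{k^\star/2} = d^{3/2}$ for the smoothed loss.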

learning single index model, name change, optimal sample complexity, (6 more...)

Neural Information Processing Systems

Country: Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.27)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes

Jagannath, Aukosh, Jones-McCormick, Taj, Sarangian, Varnan

arXiv.org Machine Learning, Nov-7-2025

We develop a high-dimensional scaling limit for Stochastic Gradient Descent with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework to rigorously compare online SGD with some of its popular variants. We show that the scaling limits of SGD-M coincide with those of online SGD after an appropriate time rescaling and a specific choice of step-size. However, if the step-size is kept the same between the two algorithms, SGD-M will amplify high-dimensional effects, potentially degrading performance relative to online SGD. We demonstrate our framework on two popular learning problems: Spiked Tensor PCA and Single Index Models. In both cases, we also examine online SGD with an adaptive step-size based on normalized gradients. In the high-dimensional regime, this algorithm yields multiple benefits: its dynamics admit fixed points closer to the population minimum, and it widens the range of admissible step-sizes for which the iterates converge to such solutions. These examples provide a rigorous account, aligning with empirical motivation, of how early preconditioners can stabilize and improve dynamics in settings where online SGD fails.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2511.03952

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD

Tsiolis, Konstantinos Christopher, Mousavi-Hosseini, Alireza, Erdogdu, Murat A.

arXiv.org Machine Learning, Oct-27-2025

To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recent works improved this by performing multiple gradient steps on the same sample with different learning rates -- yielding a non-correlational update rule -- and instead are limited by the (potentially much smaller) generative exponent. However, this picture is only valid when these learning rates are sufficiently large. In this paper, we characterize the relationship between learning rate(s) and sample complexity for a broad class of gradient-based algorithms that encapsulates both correlational and non-correlational updates. We demonstrate that, in certain cases, there is a phase transition from an "information exponent regime" with small learning rate to a "generative exponent regime" with large learning rate. Our framework covers prior analyses of one-pass SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm that leverages a two-timescales approach (via different learning rates for each layer) to go beyond correlational queries without reusing samples or modifying the loss from squared error. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.

artificial intelligence, machine learning, sample complexity, (14 more...)

arXiv.org Machine Learning

2510.2102

Country:

North America > Canada > Ontario > Toronto (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Neural Information Processing Systems, May-26-2025, 14:51:51 GMT

We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples suffice for learning $w^\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient based methods only shows that $n \gtrsim d^{k^\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\star$ with $n \gtrsim d^{k^\star/2}$ samples.

artificial intelligence, learning single index model, machine learning, (4 more...)

Neural Information Processing Systems

Country: Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

Ricci, Fabiola, Bardone, Lorenzo, Goldt, Sebastian

arXiv.org Machine Learning, Mar-31-2025

Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks from natural images resemble those learnt by independent component analysis (ICA), a simple unsupervised method that seeks the most non-Gaussian projections of its inputs. This similarity suggests that ICA provides a simple, yet principled model for studying feature learning. Here, we leverage this connection to investigate the interplay between data structure and optimisation in feature learning for the most popular ICA algorithm, FastICA, and stochastic gradient descent (SGD), which is used to train deep networks. We rigorously establish that FastICA requires at least $n\gtrsim d^4$ samples to recover a single non-Gaussian direction from $d$-dimensional inputs on a simple synthetic data model. We show that vanilla online SGD outperforms FastICA, and prove that the optimal sample complexity $n \gtrsim d^2$ can be reached by smoothing the loss, albeit in a data-dependent way. We finally demonstrate the existence of a search phase for FastICA on ImageNet, and discuss how the strong non-Gaussianity of said images compensates for the poor sample complexity of FastICA.

artificial intelligence, fastica, machine learning, (16 more...)

arXiv.org Machine Learning

2503.23896

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
